Background

Myelodysplastic syndromes (MDS) are clonal hematopoietic stem cell disorders that evolve through a multistep process. This process begins with a primary driver mutation in a hematopoietic stem cell that gives rise to clonal hematopoiesis and progressive clonal expansion. Understanding the timing of driver mutations and their association with secondary mutations can uncover evolutionary trajectories. These trajectories are identified by detecting co-occurring mutations with a significant temporal order, where an early mutation increases the probability of a later mutation. MDS heterogeneity is not only clonal but also clinical, and precise risk stratification is essential for clinical decision-making. Prior work has shown that clonal evolution correlates with overall survival (OS) in cancer. Building on these insights, we developed ProgEvo, a machine learning framework that infers molecular evolutionary features and integrates them with clinical and molecular covariates to predict clinical outcomes.

Methods

ProgEvo was trained on 2,519 patients from the original IPSS-M cohort (data from cBioPortal) to develop both the evolutionary model and the prognostic model. Cancer cell fractions were estimated from variant allele frequencies corrected for copy number to infer the temporal sequence of driver mutations within each patient. These individual temporal sequences were then aggregated into a cohort-level graph. Recurrent evolutionary routes were identified within the evolutionary graph through a maximum likelihood approach with AIC-based model selection. Evolutionary validation was conducted on an independent cohort of 2,043 GenoMed4All patients. Features consistently associated with leukemia-free survival (LFS) were integrated into the IPSS-M model while minimizing both structural changes to IPSS-M and the number of additional covariates. Clinical validation of the prognostic model was performed in both the GenoMed4All and Moffitt Cancer Center (MCC) cohorts (2,157 patients).
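As an illustration of the copy-number correction step, the following minimal Python sketch converts a variant allele frequency into a cancer cell fraction and derives a within-patient ordering. It is not the ProgEvo implementation: the correction formula is a commonly used one that we assume here, and the function names, the `purity`, `total_cn`, and `multiplicity` parameters, and the `min_gap` threshold are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, total_cn, multiplicity=1, normal_cn=2):
    """Estimate the cancer cell fraction (CCF) of a mutation.

    Assumed formula (not taken from the abstract):
        CCF = VAF * (purity * total_cn + (1 - purity) * normal_cn)
              / (purity * multiplicity)
    The result is capped at 1.0 because a fraction cannot exceed 100%.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    copies_per_cell = purity * total_cn + (1 - purity) * normal_cn
    return min(vaf * copies_per_cell / (purity * multiplicity), 1.0)


def temporal_order(ccf_by_gene, min_gap=0.1):
    """Return (earlier, later) gene pairs for one patient.

    A gene is called earlier when its CCF exceeds the other gene's CCF
    by at least min_gap (hypothetical threshold); closer pairs are
    treated as unresolved and omitted.
    """
    genes = sorted(ccf_by_gene, key=ccf_by_gene.get, reverse=True)
    pairs = []
    for i, early in enumerate(genes):
        for late in genes[i + 1:]:
            if ccf_by_gene[early] - ccf_by_gene[late] >= min_gap:
                pairs.append((early, late))
    return pairs


# Made-up CCF values for a single hypothetical patient.
example_pairs = temporal_order({"ASXL1": 0.9, "KRAS": 0.3, "TET2": 0.85})
```

In this toy example ASXL1 and TET2 are both ordered before KRAS, while the ASXL1/TET2 pair is left unresolved because their CCFs differ by less than the threshold.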

Results

We selected 46 genes consistently sequenced across training and testing datasets. In the cBioPortal cohort, 7,828 mutations were analyzed. Genes were initially assigned a discrete temporal rank (1, 2, or 3) based on their aggregated position in the evolutionary graph, with lower ranks representing earlier acquisition. A continuous rank score was then derived for each gene through bootstrap resampling of patient-level temporal sequences. 1° rank genes included KMT2D, NOTCH1, ATRX, CREBBP, and KIT. The 2° rank comprised the backbone of initial driver mutations: DNMT3A, SF3B1, SRSF2, TET2, TP53, U2AF1, ASXL1, and EZH2. Late events (3° rank) comprised KRAS, STAG2, RUNX1, CBL, PPM1D, and NRAS. Gene co-occurrence alone was not considered sufficient to define evolutionary relationships. A directional route was inferred only when a “parent” gene consistently preceded the “child” gene in the evolutionary graph. In total, 1,765 gene co-occurrences were aggregated into 45 directional evolutionary routes; 18 were validated in the GenoMed4All cohort. Several routes were significantly associated with LFS in univariate analysis, including ASXL1→KRAS (HR 2.87, CI 1.72–4.78), ASXL1→STAG2 (HR 2.51, CI 2.04–3.08), DNMT3A→BCOR (HR 1.95, CI 1.33–2.86), SF3B1→RUNX1 (HR 1.84, CI 1.26–2.68), SRSF2→NRAS (HR 2.84, CI 1.64–4.92), SRSF2→STAG2 (HR 2.24, CI 1.76–2.85), and TET2→STAG2 (HR 2.72, CI 2.03–3.66). Two validated directional routes (ASXL1→KRAS, SRSF2→NRAS) and two early genes (ATRX, JAK2), independently associated with LFS outside the original IPSS-M model, were incorporated into IPSS-M to generate the IPSS-M-Evo score. The model was adjusted for age and co-occurrence of NRAS and RUNX1. IPSS-M-Evo improved prognostic discrimination for both LFS and OS, with higher c-indexes (0.76 vs 0.75), lower AIC (ΔAIC 109 for OS and 77 for LFS), and reclassification of over 40% of patients. Clinical performance was independently validated in both the GenoMed4All and MCC cohorts.
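The rule that a route is inferred only when a parent gene consistently precedes a child gene can be illustrated with a simple stand-in. The Python sketch below aggregates per-patient (earlier, later) pairs and keeps a direction only if a one-sided sign test rejects an even split. This is a deliberately simplified substitute for the maximum likelihood and AIC-based selection described in the Methods, and the function names, `alpha`, and `min_patients` are hypothetical.

```python
from collections import Counter
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def directional_routes(patient_orderings, alpha=0.05, min_patients=5):
    """Aggregate per-patient (earlier, later) pairs into directional routes.

    patient_orderings: one list of (earlier, later) gene pairs per patient.
    A route parent -> child is kept when the pair is ordered in at least
    min_patients patients and the parent-first direction is more frequent
    than expected under a 50/50 split (one-sided sign test at level alpha).
    Returns a dict mapping (parent, child) to its p-value.
    """
    # Count each ordered pair at most once per patient.
    counts = Counter(pair for pairs in patient_orderings for pair in set(pairs))
    routes = {}
    for (parent, child), forward in counts.items():
        reverse = counts.get((child, parent), 0)
        n = forward + reverse
        if forward <= reverse or n < min_patients:
            continue
        p_value = binomial_tail(forward, n)
        if p_value < alpha:
            routes[(parent, child)] = p_value
    return routes


# Made-up cohort: 8 patients with ASXL1 before KRAS, and an even
# 3-versus-3 split for the TET2/DNMT3A pair.
cohort = (
    [[("ASXL1", "KRAS")]] * 8
    + [[("TET2", "DNMT3A")]] * 3
    + [[("DNMT3A", "TET2")]] * 3
)
routes = directional_routes(cohort)
```

Here only ASXL1→KRAS is retained; the evenly split TET2/DNMT3A pair co-occurs but shows no consistent direction, so no route is inferred for it.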

Conclusions

By resolving co-occurrence into directional evolutionary trajectories, ProgEvo reduces the number of co-mutations to evaluate by prioritizing temporally consistent events. Integrating evolution-informed features into IPSS-M (IPSS-M-Evo) improves risk stratification. Bringing evolution into clinical practice may enable personalized NGS monitoring, anticipatory or more personalized therapy, and potentially drug discovery based on molecular evolution.
